Correction of Errors in a Modality Corpus Used for Machine Translation Using Machine-learning
Authors
Abstract
We performed corpus correction on an annotated corpus for machine translation using machine-learning methods such as the maximum-entropy method, and thereby constructed a high-quality annotated corpus. In our experiments we compared several different methods of corpus correction and developed a suitable one. Corpus-based machine translation has recently been investigated, and because it relies directly on corpora, the corpus correction discussed in this paper should prove significant.
Similar resources
Correction of Errors in a Modality Corpus Used for Machine Translation by Using Machine-learning Method
We performed corpus correction on a modality corpus for machine translation by using such machine-learning methods as the maximum-entropy method. We thus constructed a high-quality modality corpus based on corpus correction. We compared several kinds of methods for corpus correction in our experiments and developed a good method for corpus correction.
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in the English language are hyphenated, with the hyphen separating the different parts. The Persian language has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead of the half-space character. This common incorrect use of the space leads to some s...
The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings
English as a Second Language (ESL) learners’ writings contain various grammatical errors. Previous research on automatic error correction for ESL learners’ grammatical errors deals with restricted types of learners’ errors. Some types of errors can be corrected by rules using heuristics, while others are difficult to correct without statistical models using native corpora and/or learner corpora...
Detection of Glioblastoma Multiforme Tumor in Magnetic Resonance Spectroscopy Based on Support Vector Machine
Introduction: The brain tumor is an abnormal growth of tissue in the brain, which is one of the most important challenges in neurology. Brain tumors have different types. Some brain tumors are benign and some brain tumors are cancerous and malignant. Glioblastoma Multiforme (GBM) is the most common and deadliest malignant brain tumor in adults. The average survival rate for peo...
Journal title:
Volume, issue:
Pages: -
Publication date: 2002